Folder hierarchy

input: functions to read and parse data (stored in .xlsx format) into a format suitable for further analysis. The output is a table of input parameters along with a column vector of labels assigned from q1.

random_sampling: functions to perform simple random sampling and stratified random sampling, with strata defined by the district and rbf_type groupings.
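As an illustration of stratified sampling over the district and rbf_type groupings, here is a minimal Python sketch. It is not the code in this folder: the function name stratified_sample, the row layout, and the rule of keeping at least one row per stratum are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(rows, keys, fraction, seed=0):
    """Draw roughly the same fraction of rows from every stratum.

    Hypothetical helper: a stratum is the set of rows that share the
    same values for all columns listed in `keys`.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[k] for k in keys)].append(row)
    sample = []
    for group in strata.values():
        # Assumption: keep at least one row so small strata are not dropped.
        n = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, n))
    return sample

# Made-up toy data: three strata of sizes 10, 4 and 6.
rows = [{"district": d, "rbf_type": t, "id": i}
        for i, (d, t) in enumerate([("A", 1)] * 10 + [("A", 2)] * 4 + [("B", 1)] * 6)]
subset = stratified_sample(rows, keys=("district", "rbf_type"), fraction=0.5)
```

Sampling inside each stratum keeps the district and rbf_type proportions of the subset close to those of the full data, which simple random sampling does not guarantee.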

categ_conv: functions to perform one-hot (one-of-k) encoding and decoding of categorical variables.
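The one-of-k scheme can be sketched in a few lines of Python. The function names one_hot_encode and one_hot_decode are hypothetical, and sorting the categories to fix the column order is an assumption of this example, not a description of the functions in this folder.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 row with a single 1 (one-of-k)."""
    # Assumption: columns follow the sorted order of the distinct categories.
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = [[1 if index[v] == j else 0 for j in range(len(categories))]
               for v in values]
    return encoded, categories

def one_hot_decode(encoded, categories):
    """Recover the original values from the position of the 1 in each row."""
    return [categories[row.index(1)] for row in encoded]

# Made-up example values.
colours = ["red", "green", "red", "blue"]
encoded, categories = one_hot_encode(colours)
decoded = one_hot_decode(encoded, categories)
```

Returning the category list alongside the encoded rows is what makes decoding possible later, since the rows alone do not record which column stands for which value.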

kFoldCrossValidation: functions to perform k-fold cross validation by splitting the input data set into training and validation sets. By default, 10-fold cross validation is performed. See the readme in the sub-directory for detailed information on the included functions and their usage.
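For readers unfamiliar with the technique, the following Python sketch shows one common way to split sample indices into k folds. The name k_fold_indices, the shuffling step, and the seed parameter are assumptions of this example; the readme in the sub-directory remains the reference for the actual functions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    # Assumption: shuffle once so that folds do not follow the file order.
    random.Random(seed).shuffle(indices)
    # Every k-th index goes to the same fold, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# 25 samples, default 10 folds: each sample is validated exactly once.
splits = list(k_fold_indices(25))
```

Each sample appears in exactly one validation set and in the training set of the other k-1 rounds, so every observation contributes to both fitting and evaluation.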

perf_curves: functions to compute false positive rate, F1 score, precision, recall and AUC. Also included are functions to plot the ROC curve (tpr vs. fpr) and the precision-recall curve (precision vs. recall).
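The metrics listed above follow from the confusion-matrix counts. The Python sketch below is illustrative only: the names basic_metrics and roc_auc are hypothetical, labels are assumed to be 0/1, and AUC is computed with the rank-based (pairwise comparison) formulation rather than by integrating a plotted curve.

```python
def basic_metrics(y_true, y_pred):
    """Precision, recall, false positive rate and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumption: a ratio with an empty denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Requires at least one positive
    and one negative label.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels, hard predictions and scores for illustration.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
metrics = basic_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise form of AUC is equal to the area under the ROC curve, but it is quadratic in the number of samples, so it suits small checks rather than large data sets.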

setupML: functions that work in conjunction with the cross-validation scripts to apply the supervised learning techniques. These include training functions for the Logistic Regression, Naive Bayes, Support Vector Machine and Random Forest methods. Once a model is trained, the predictML function serves as a generic container for predicting the class label of unseen instances with any of the ML methods.

tuneRF: functions to tune Random Forest using quantile error and Bayesian optimization. Also included are functions to compute the out-of-bag classification error against the number of grown trees in order to optimize the mtry parameter, and to compute feature importance estimates.
